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Abstract 

The necessity to secure the communication between hardware compo- 
nents in embedded systems becomes increasingly important with regard 
to the secrecy of data and particularly its commercial use. We suggest 
a low-cost (i.e. small logic-area) solution for flexible security levels and 
short key lifetimes. The basis is an approach for symmetric key exchange 
using the synchronisation of Tree Parity Machines. Fast successive key 
generation enables a key exchange within a few milliseconds, given realis- 
tic communication channels with a limited bandwidth. For demonstration 
we evaluate characteristics of a standard-cell ASIC design realisation as 
IP-core in 0.18^-technology. 

1 Introduction 

For embedded systems like handheld devices, smartcards, mobiles or other wire- 
less communication devices security concepts need to be developed, in order 
to keep privacy and still (commercially) exploit the merits of such devices in 
widespread and everyday use j^j. This is, for example, of particular interest for 
the smartcard- or RFID-industry, where the secrecy of data is directly linked 
to the commercial prosperity of a product. Also, the economic importance to 
secure information technology applications in the automotive area is becoming 
eminent along with the protection of firmware, access control, anti-theft pro- 
tection, up to scenarios like the hacking of vital vehicle functions such as an 
antilock braking system (see e.g. |17p. 

Yet, the often relatively small size and severe power consumption constraints 
of these devices limit the available size for additional cryptographic hardware 
components 0] 1271 128| . This holds in particular for sensor networks, RFID- 
systems and near field communication devices. Secure hardware is thus es- 
pecially demanded for ubiquitous and pervasive computing, and the need and 
research efforts manifest in first conferences on security in pervasive computing 
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Hardware-cryptosystems are often based on hard-coded secret keys as the 
basic secret. It is good common practice to obey the often cited Kerckhoffs 
Principle ^U] C no security through obscurity') and not base the security of 
a crypto-system on the secrecy of the device or algorithm it employs. The 
security of a system is thus only as strong as the secrecy of the (fixed) keys. 
But some of the most effective attacks on a crypto-system involve no ciphertext 
analysis but instead find flaws in the key-management. Furthermore, insecure 
bus communication as reported in ,7J (regarding the video game console market), 
allows attacks still above the chip level by sniffing internal buses. In embedded 
system environments, functions are being realized (at least partly) in hardware 
and often lack online system access. The changing of a fixed key, as any other 
security update, is difficult or even impossible - i.e. too expensive. 

The exchange of a common secret key over a public channel is dominated 
by methods based on number theory since the invention of the Diffie-Hellman 
key exchange protocol in 1976 [H]. Computational security is based on the dif- 
ficulty of the discrete logarithm problem in El Gamal 6 , which is considered 
as difficult as the factorisation problem of a product of long prime numbers as 
in RSA [201 ■ Such asymmetric algorithms need to perform a lot of computa- 
tional intensive arithmetics on typically limited embedded microcontrollers. In 
a particular GSM mobile phone, for example, two algorithms are combined to 
meet performance requirements: an asymmetrical algorithm with a 1024 bit key 
for key exchange and a symmetrical algorithm using only 128 bit for the key 
and voice encryption [23] • This also demonstrates the often necessary tradeoff 
between the level of security and the available resources. 

The state-of-the-art, regarding applications in embedded systems, is repre- 
sented by Elliptic Curve Cryptography and the generalisation to Hyper- Elliptic 
Curves (see e.g. [T8]). Without a reduction of the security, these representations 
allow to reduce the size of the numbers to calculate with. Yet, more complex 
expressions need to be calculated. After all, a (frequent) key exchange is often 
of prohibitive cost, especially in the often changing topology of pervasive or 
ad- hoc networks. 

In this paper we present a small hardware solution for secure data exchange 
with flexible security levels and short key lifetimes. It is based on a fast succes- 
sive key generation and exchange process. We use a hardware-friendly algorithm 
for secure symmetric key exchange by synchronisation of socalled Tree Parity 
Machines HJ. We define architectures, using this key exchange concept, that 
allow fast successive key generation and exchange. The key exchange ranges 
within milliseconds for realistic channels and can be performed in parallel (or 
multiplexed) to encryption and the encrypted communication process. Addi- 
tionally, we provide the architectures with a flexible rekeying functionality to 
enable full exploitation of the achievable exchange rates. This particularly in- 
creases the cost for a successful immediate (online) attack, as opposed to a 
subsequent (offline) analysis on recorded information. Our focus is on secure 
data exchange between hardware components in embedded systems like RAM, 
FLASH-type ROM, (co-)processors and on bus-communication in general. Envi- 
ronments in which security can also be of moderate concern are also considered. 
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In the following, we introduce the neural network structure and a learning 
algorithm (section [21 , also in order to already point out advantageous proper- 
ties for a hardware realization. The synchronization effect leading to the key 
exchange property is explained. Algorithmic security implications on the re- 
alization of our architecture are described in section [3J Section 0] comprises 
the architectural design of the proposed hardware component with its rekeying 
functionality. Here, we also refer to design decisions prepared in the previous 
section. In section \5\ we present results from an FPGA and an ASIC imple- 
mentation on silicon area, possible clock and key exchange rates (throughput). 
We conclude the paper in section [S] with a short summary and an outlook on 
possible further extensions also referring to current research activities. 



2 Tree Parity Machines for Key Exchange 

In [U], Kinzel et. al. proposed a symmetric key exchange method based on the 
fast synchronisation of two identically structured Tree Parity Machines (TPMs). 
The particular tree structure has non-overlapping binary inputs, discrete weights 
and a single binary output as depicted in Fig. Studying interacting neural 




Figure 1: (a) The tree parity machine (TPM) generates a single output - the 
parity of the outputs of the hidden units, (b) For mutual learning, outputs on 
commonly given inputs are exchanged between the two parties A and B. 



networks in general (cf. |14lllll l8*]V the authors focused the phenomenom of fast 
synchronization by mutual learning TPMs and its potential for a cryptographic 
approach, not involving large numbers and methods from number theory. Their 
exchange protocol is realized implicitly by a mutual adaptation process between 
the two parties A and B, not involving large numbers and methods from number 
theory. 
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2.1 Structure of a Tree Parity Machine 



In the following, we describe the implemented parallel-weights version using 
hebbian learning (cf. |§])- Weights are identical in both TPMs after synchroni- 
sation. The anti-parallel-weights version, using anti-hebbian learning and lead- 
ing to inverted weights at the other party, is omitted for brevity. The notation 
A/B denotes equivalent operations for the parties A and B. A single A or B 
denotes an operation which is specific to one of the parties. 

The TPM (see Fig. QJi) consists of K hidden units (1 < k < K) in a single 
hidden-layer with non-overlapping inputs (the tree structure) and a single unit 
in the output-layer. 

Each hidden unit receives different N inputs (1 < j < N), leading to an 
input field of size K ■ N. The vector-components are random variables with 
zero mean and unit variance. The output A/B {t) € { — 1,1}, given bounded 
weights w^j B (t) G [-L, L] C Z (from input unit j to hidden unit k) and common 
random inputs Xkj(t) G { — 1, 1}, is calculated by a parity function of the signs 
of summations: 

K K K / N \ 

o A/B (t)=n^ /B (*)=n^ /B (*))=ii ff £< fl (*)^(*) ■ (i) 

fe=l fc=l fc=l \j=l / 

The common random inputs can also be kept secret between the parties, yield- 
ing authentication (see Section l?!^) l . a is a party-specific modified sign- function, 
that defines an agreement between the two parties on an opposite sign in case 
of a sum a A/B (t) G Z of zero: 

[-1 , a fe ' (t) <0 V a fe (t) = . 

From the communicated output, the outputs of the hidden units cannot be 
uniquely determined. There are multiple combinations for a signed or unsigned 
output, depending on the number of hidden units K. 



2.2 Key Exchange by Mutual Learning and Synchronisa- 
tion 

The so-called bit package variant was chosen for implementation (cf. [S]). Due 
to an reduction of (physical) output exchanges by an order of magnitude, it 
is advantageous for practical communication channels with a certain protocol 
overhead. 

Parties A and B start with an individual randomly generated initial weight 
vector w£j B (t ) - their secret. After a set of b > 1 presented inputs, where b 
denotes the size of the bit package, the corresponding b TPM outputs (bits) 
A/B (t) are exchanged over the public channel in one package (see Fig. [IJd). 
The b sequences of hidden states y^ /B (t) G {—1, 1} are stored for the subsequent 
learning process. 

A hebbian learning rule is applied to adapt the weights, using the b outputs 
and b sequences of hidden states. They are changed only on an agreement on the 
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parties' outputs. Furthermore, only weights of those hidden units are changed, 
that agree with this output: 



w^ B (t - 1) + O a ' b (t) x kj (t) ,0 A (t) = B (f ) A 
««'(*):=< O*' B (t)y£' B {t)>0 (3) 



kj 



(t — 1) , otherwise. 



Updated weights are bound to stay in the maximum range [— L, L] C Z by 
reflection onto the boundary values 

wtr(t) . = Ui g n( W tr(t))L ,KT«|>£ (4) 
l_i»^ B '(t) , otherwise. 



In iterating the above procedure, each component of the weight vectors per- 
forms a random walk with reflecting boundaries. This implies a trajectory in 
a weight space of (2L + 1)™ points. Two corresponding components in w A ^{t) 
and w B j(t) receive the same random component of the common input vector 
Xkj(i). After each bounding operation (Eq. 0J, the distance between the com- 
ponents is successively reduced to zero. Synchrony is achieved when both parties 
have learned to produce each others outputs. They remain synchronised (see 
learning rule Eq. ©) and continue to produce the same outputs on every com- 
monly given input. This effect in particular leads to common weight- vectors 
in both TPMs in each of the following iterations. These weights have never 
been communicated between the two parties and can be used as a common 
time-dependent key for encryption and decryption respectively. Such secret key 
agreement based on interaction over a public insecure channel is also discussed 
under information theoretic aspects by Maurer especially with regard to 
unconditional security. Furthermore, synchrony is achieved only for common in- 
puts. Thus, keeping the common inputs secret between A and B can be used to 
have an authenticated key exchange. There are 2 KN — 1 possible inputs in each 
iteration, yielding as many possible initialisations for a pseudo random num- 
ber generator. Shamir et al. conferred to such a synchronization over multiple 
rounds as a gradual type of Diffie-Hellman key exchange because DifRc- 
Hellman has a single round that transmits several bits. Obviously, a test for 
synchrony cannot practically be defined by checking whether weights in both 
nets have become identical. One therefore tests on successive equal outputs in 
a sufficiently large number of iterations t min , such that equal outputs by chance 
are excluded: 

Vt 6 [*V" ,t' + t min ] : A {t) = B {t) . (5) 

The number of outputs (bits) required to achieve synchronisation is lower than 
the size of the key |16j . Synchronisation time is finite for discrete weights. It is 
almost independent on N and scales with IniV for very large N. Furthermore, 
it is proportional to L 2 |lo|. Our investigations/experiments confirmed that 
the average synchronisation time is distributed and peaked around 400 for the 
parameters given in [5] . 
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3 Security and Rekeying Functionality 



The symmetric key-exchange protocol can generate long keys by fast calculations 
and building the secure channel is of linear complexity. It scales with the size 
K ■ N of the TPM structure HJ, which defines the size K ■ N ■ L of the key. 
In order to still allow comparisons with the literature we chose L = 4 for our 
implementation. The time to synchronise roughly doubles in comparison to 
L = 3, while for the attacker the same time increases by orders of magnitude 
(see EU)- 

The security of the key exchange manifests in algorithm-specific properties 
and can be fully exploited by appropriate hardware design. Next to other general 
(algorithmic) aspects on the security of the exchange method as described in 
HIIll , the tracking of the weights is hard in comparison to synchronisation - 
practically even harder when implemented in hardware. 

The key exchange protocol has been attacked by several eavesdropping ap- 
proaches, which always require full knowledge of the TPM structure in use. We 
will describe their basic properties in order to clarify the security aspect. Due to 
the nature of the key exchange and its attacks, only probabilistic definitions of a 
'successful attack' can be provided. One can distinguish between two classes of 
attacks. The first class comprises attacks, that can be defeated by appropriately 
increasing the parameter L. The consequence is, that the learning time of an 
attacker is significantly longer than the synchronisation time. The security in- 
creases proportional to L? while the probability of a successful attack decreases 
exponentially with L JS| • Among these so defeatable attacks, which try to syn- 
chronise faster than the two parties jo], are the Naive Attack, that uses a single 
or an ensemble of several identically structured TPMs. The Genetic Attack even 
comprises a population of thousands of TPMs, whose internal representations 
are optimised by a genetic algorithm ^2] ■ A successful attack is defined here as 
synchronising faster than the parties A and B and could be realized for K = 2 
in 50% of all cases. But, already for L = 3, this attack has shown to be less 
effective than the Flipping Attack. The complexity of such attacks (especially 
in hardware), with hundreds or thousands of TPMs plus an additional (genetic) 
algorithm, is obviously high. The Flipping Attack defines a successful attack as 
having 98% overlap with the weights of one party, when parties A and B are 
already synchronous. For L > 2, an ensemble of 10000 Flipping Attackers was 
found less effective than a single attacker, which revokes its practical use 

All of the previously sketched attacks can be made arbitrarily costly and thus 
practically defeated by increasing L, which significantly decreases the probability 
of a successful attack. The approach thus remains computationally secure for 
sufficiently large L PTH23) . 

The only attack, which does not seem to be affected by an increase of L 
(but still by an increase of K) is the socalled Majority Flipping Attack. It uses a 
hundred of coordinated and communicating TPMs |25j . Yet, the given definition 
of a successful attack is problematic: When A and B have fully synchronised, 
the attacker has 98% average overlap (i.e. a fraction) with the weights, in 50% 
of all cases. For 99% average overlap, the probability reduces to 25%. This 
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indicates that the difficulty lies in achieving the last percents. The authors 
chose this definition, because of the strong fluctuations they observed in the 
success probability. But the definition of overlap is an average overlap over all 
hidden units. Thus an attacker does not know, which of the K ■ N components 
of the weights (the key) are correct in a real attack scenario. In currently used 
symmetric encryption algorithms, the flipping of a single bit only already leads 
to a complete failure in decryption. Thus practically, one still has to perform a 
subsequent brute-force attack, which would then only be successful in 50% of all 
cases. Keep in mind, that A and B already have one key (while the attacker has 
98% with a probability of 0.5) and start encryption and data transmission. The 
attacker needs to perform his brute force attack plus the attack on the encrypted 
data in parallel. Furthermore, the rekeying principle and the achievable short 
key lifetimes (cf. Section f5TTft aim at an online usage of the exchanged keys for 
secure transmission. 

All formulated attacks can hardly be performed online. Only an offline 
attack on the previously recorded exchanged information seems realistic. Last 
but not least, note that all of the existing attacks are based on knowing the 
common inputs and thus refer to a non-authenticated key exchange, in which 
man-in-the-middle attacks are possible as well. 

3.1 Feasible Immediate Rekeying 

We propose to minimise the key lifetime as much as possible employing imme- 
diate rekeying, that allows to exploit the speed of key exchange and features of 
our hardware component. Such a rekeying process normally is to be avoided 
due to the computational cost of a new key exchange. Strategies are developed 
to increase the key lifetime without affecting the security (see e.g. pQ). Yet, 
using the TPM principle allows for efficient rekeying in the fcHz-range (see also 
Section EJ . 

Next to several other propositions (cf. 9 ) concerning the en-/decryption, 
one particular proposition (cf. [lip is to take each (common) weight vector 
after synchronisation for en-/decryption. On the one hand, a new potential 
key is present in each step, which can then be used block-wise. On the other 
hand, an opponent then also has the chance to synchronize using the ongoing 
communication (cf. Section and get a key. 

We suggest to permanently generate (i.e. synchronise) new keys in parallel or 
multiplexed to the encryption-transmission-decryption of data, using the most 
recent key exchanged. In this case, the key is only used to encrypt a certain small 
subset of the plaintext. As soon as a new key has been exchanged, it is used 
for encryption. This especially allows to realize short key lifetimes, enabling a 
certain security level by many smaller keys instead of one large key. 

Consequently, in our hardware design, we allow an external unit to demand 
a key exchange service. Our TPMRAs will continuously synchronise new keys, 
as long as data needs to be exchanged. Once a crypt-unit uses the first key, 
synchronisation is triggered again to always provide a next key. In this way, 
the related hardware resources are consequently used and keys are exchanged 
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at a maximum rate subject to hardware constraints, average synchronisation 
time and available channel bandwidth. Furthermore, it allows to implement 
services like periodic or even adaptive rekeying. Security is thus increased in 
our hardware implementation through feasible immediate rekeying, the mere 
speed of key exchange and the achievable short key lifetimes. 

4 Tree Parity Machine Rekeying Architectures 

It is important to note that, with respect to a hardware implementation, only 
signs and bounded integers are processed within the algorithm. The result of the 
outer product in Eq. Q can be realized without multiplication. The product 
within the sum is only changing the sign of the weight. Thus, the most complex 
structure to be implemented is an adder. The complexity of such a unit is thus 
even less than the complexity of a linear filter, which requires a full multiply- 
accumulate structure. Yet, the inherent parallelism can be exploited here as 
well. The branches in Eq. are only based on a test for the sign or a test on 
equality to zero, also easily done in hardware. 

Furthermore, only sign-operations and additions are present in the learning 
rule (Eq. ©, well suited for a hardware implementation. The bit package ex- 
change can either be realized serially or via a parallel bus, depending on the users 
requirements and the intended application. The amount of registers needed for 
storage increases in the bit package variant, finally imposing a tradeoff area vs. 
speed. 

Equal (pseudo-)random inputs are realized by equally initialised Linear Feed- 
back Shift Registers (LFSR) or a Cyclic Redundancy Code (CRC). Different 
(secret) initial weights can either be fixed (device-specific), or they can be pro- 
vided by an additional application-specific device or by a thermal noise device. 
The synchronisation criterion (Eq. (JSJ) basically comprises a counter. 

The proposed Tree Parity Machine Rekeying Architectures (TPMRAs) are 
functionally separated into two main structures. One structure essentially com- 
prises the Key Handshake and Bit Package Control. The other structure con- 
tains the TPM Unit and its control state machine. 

4.1 Key Handshake and Bit Package Control 

As described in Section 12.21 we implemented the bit package generalisation of the 
protocol (cf. H|). The overall structure of the TPMRAs is shown in Fig. [2] It 
consists of three functional blocks: a Key Handshake and Bit Package Control, 
the TPM unit and a Watchdog timer. 

The Watchdog timer supervises the number of interactions needed for a key- 
exchange between two parties (Eq. ©). If there is no synchronisation within a 
specific time (remember that the synchronisation time is distributed), a signal 
(sync_error) indicates a synchronisation error. It is programmable for variable 
average synchronisation times subject to the chosen TPM structure. 
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Figure 2: Basic diagram of the Tree Parity Machine Rekeying Architectures. 

The Key Handshake and Bit Package Control handles the key transmission 
with an encryption unit and the bit package exchange process with the other 
party. It accomplishes the bit packaging by partitioning the parity bits from the 
TPM unit in tighter bit slices. Due to different computation cycles between two 
key exchange parties, the rekeying procedure employs a key request (req_key), 
a key changed (key_cha) and a key commit (key_com) handshake protocol (see 
Fig. A key is handed over via the internal bus (Key) to an encryption unit 
when the synchronisation process is finished. For our application domain in 
embedded system environments, we choose a fixed bit package length of 32 bit 
for physically parallel exchange and synchronisation over a 32 bit wide bus (Bit 
Package). The bit package exchange process uses a simple request/acknowledge 
handshake protocol (BP_ack, BP_req). 

4.2 Tree Parity Machine Unit 

The TPM unit comprises the logic for the TPM structure, such as the logic for 
calculating the parity bits as explained in Section [21 It consists of the TPM 
control, a Cyclic Redundancy Code (CRC) generator, a Parity Computation 
unit and a Weight Adjustment unit. A register bank holds the data for the 
hidden unit and the weights of the network as shown in Fig. |3 

The TPM control is realized as simple finite state machine (FSM) which 
executes the initialisation of the TPM and the learning process with the bit 
package from the other party. The Parity Computation unit calculates the 
summation and the parity bit (Eq. Q and[21). The weight adjustment unit 
accomplishes the learning rule (Eq. J3J and (QJ). 

The CRC random generator generates the pseudo random bits for the inputs 
of the TPM. It is initialised by a vector which is equal for both parties. For the 
purpose of authentication, the initial value would have to be kept secret. 
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Figure 3: Internal structure of the Tree Parity Machine Unit. 

5 Implementation and Performance 

We designed and simulated parameterisable, serial and semi-parallel TPM Rekey- 
ing Architectures, using VHDL to implement an FPGA- and an ASIC-realisation. 
For both architectures we appoint the integer range L to 4, as explained in 
Section |31 In the serial architecture, the synaptic summation is performed by 
Time Devision Multiple Access (TDMA) of an L-bit adder, while the semi- 
parallel form uses TDMA of six L-bit adders in parallel. The details of the 
TPMRA implementations (key length K ■ N ■ L, serial or semi/fully-parallel re- 
alisation) must be chosen with respect to the target environment, including the 
used parameters, the timing, the available channel capacity and the available 
chip-area, of course. 

We realized a two party prototype system based on two XCV300E-8 (Virtex 
E) FPGAs from Xilinx in order to investigate and demonstrate the functionality 
of our architectures. Standard cell ASIC prototype realisations were built to 
verify the suitability of the TPMRAs for typical embedded system components. 
We chose K = 3 and varied N up to 49 for a resulting key size of 3 • 49 • 4 = 588 
bit. This choice for N already allows a remarkable key length and still keeps 
the average synchronisation time low (cf. [S]). The underlying process was 
a 0.18/x six-layer CMOS process with 1.8F supply voltage based on the UMC 
library. The design was synthesised using the Synopsys Design- Compiler and 
was mapped using Cadence Silicon Ensemble. 

The area (Fig. of the TPMRA realisations scale approximately linear 
(around one square-millimetre) due to the linear complexity of the adders. The 
serial TPM realisation consumes less area (i.e. less hardware resources). Note, 
that most of the area is consumed by the bit packaging, because of the necessary 
storage of the inputs for the learning (cf. Section 

Obviously, the achievable clock speed (Fig. |3J}) in the serial variant is sig- 
nificantly higher than in the semi-parallel version. This is due to the necessity 
of a longer clock tree for the additional registers to store partial results. 

Additionally, we established the throughput (i.e. keys per second) subject 
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Figure 4: Post-synthesis results for chip-area (logic) (a) and achievable clock- 
frequency (b) vs. key length. Average key exchange rate (avg. synchronisation 
time of 400 iterations) vs. key length is plotted in (c). A practically finite 
channel capacity is neglected here. Plot (d) is log-scaled and shows average key 
exchange rates for a 588 bit key and a selection of typical channels with their 
capacities. All data refers to a UMC 0.18 micron six-layer standard cell process. 



to the average synchronisation time of 400 iterations for different key lengths in 
Fig. Efc. We assumed the maximally achievable clock frequency with regard to 
each key length, which can be achieved by Digital Phase Lock Loop (DPLL), re- 
gardless of the systems clock frequency. Furthermore, we appointed the average 
synchronisation time of 400 iterations for all key lengths, although it is really 
always less than the size of the key (a worst-case scenario). This data refers to 
an idealised infinite channel bandwidth, neglecting the transmission delay. For 
key lengths smaller than approximately 180 bit, the serial TPMRA has a higher 
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throughput (in the range of 2.5 • 10 4 to more than 4 • 10 4 keys per second) due 
to the higher clock frequency (Fig. 0Jj). Beyond this point, the semi-parallel 
version achieves a higher throughput, exploiting the parallel computation. 

Figure 0Ji shows the same information, but for three real communication 
channels and their bandwidths, given a key length of 588 bit. The chosen log- 
scale allows to see the small difference regarding the throughput (up to around 
1000 keys per second) for an I 2 C and CAN-bus. Only for buses of higher 
bandwidth such as the PC/-bus, the two architectures show a significantly 
different throughput (reaching the fcffz-range). In the case of an 32 bit PCI- 
bus in burst mode, the theoretical maximum throughput (as in Fig. ^i) can be 
achieved. We also considered other bus systems (e.g. packet based systems like 
WLAN). The results are similar, due to their small bandwidth in comparison to 
the PCT-bus. Obviously, the bottleneck is the underlying communication-bus, 
as it is also typical in other domains (processor-bus-bottleneck). Given a high- 
speed communication channel, the proposed key exchange and rekeying in the 
fcffz-range allows us to use rather weak encryption algorithms (cf. Section[3J|, as 
the security may rely on fast rekeying. Of course, any other more sophisticated 
encryption algorithm like AES or 3-DES can also be used. 

The achievable average key-exchange rates of the TPMRAs in the /cHz-range, 
allow to increase the security through a feasible frequent key exchange. Short 
key lifetimes can be realized efficiently. Also, any successful online attack must 
at least achieve the same performance, requiring significant hardware expenses. 
This does not appear to be feasable. Using different keys for encryption and 
transmission of different blocks of data, increases the difficulty for an attack on 
the encrypted data. 

Due to the small area in the range of one square-millimetre, we regard the 
field of application principally as an IP-core in embedded system environments. 
A particular focus can be smartcards or transponder-based applications such as 
RFID-systems and devices in ad-hoc networks , in which a small area for 
cryptographic components is mandatory. 

6 Summary and Outlook 

We presented a solution for secure communication in embedded system envi- 
ronments via Tree Parity Machine Rekeying Architectures. Our investigations 
confirm the results as presented in ||| and stress the advantages of a hardware 
implementation. The silicon area lies within a square-millimetre and allows to 
exchange keys of practical size within about a millisecond. The proposed ex- 
change in parallel to encryption-transmission-decryption also allows for efficient 
rekeying schemes and short key lifetimes. 

Next to algorithmic extensions to further increase the security |21U22llT5ll23| . 
architectural improvements or variants include a fully serial realisation with 
TDM A usage of a single TPM unit. This further decreases the area consumption 
but at the cost of an increase in necessary cycles for one output bit. A stream 
cipher variant, using output bits directly via the Blum- Blum- Shub bit generator, 
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was suggested already in [5] and its implementation in hardware is particularly 
suited for streaming applications. The relatively small size of the TPMRAs 
allows an implementation in embedded systems with only small overhead. They 
are especially suited for devices of limited resources and even more in moderate 
security scenarios. Consequently, the integration of our architectures into such 
a system and its practical evaluation is subject to future work. 
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